# Architecture Recognition
Sarashina2 Vision 14b
MIT
Sarashina2-Vision-14B is a large Japanese vision-language model developed by SB Intuitions. It combines the Sarashina2-13B language model with the image encoder from Qwen2-VL-7B and is reported to perform strongly on multiple benchmarks.
Image-to-Text
Transformers, Supports Multiple Languages

sbintuitions
192
6
Sarashina2 Vision 8b
MIT
Sarashina2-Vision-8B is a large Japanese vision-language model trained by SB Intuitions. It is built on the Sarashina2-7B language model and the image encoder from Qwen2-VL-7B, and is reported to perform strongly on multiple benchmarks.
Image-to-Text
Transformers, Supports Multiple Languages

sbintuitions
1,233
4